Pile foundations are widely used in construction projects, where their capacities (pile bearing
capacity, PBC) must be determined at different stages of construction. A reliable and accurate
prediction model offers several advantages, such as reducing the construction cost, shortening
the construction timeline, and improving construction safety. Hence, the aim of this study is to
develop statistical and artificial intelligence (AI) models for predicting the bearing capacities
of 141 piles. At the preliminary stage of the study, the features (inputs) used to predict PBC
were selected through simple regression analysis. The study then applies different kernels of the
support vector machine (SVM) technique, i.e., the dot, the radial basis function (RBF), the
polynomial, the neural, and the ANOVA kernels, to predict the PBC. These models were evaluated
with several performance indices, and their results were compared using a simple ranking system.
The results showed that the SVM-RBF model achieves the highest coefficient of determination, with
R² values of 0.967 and 0.993 for the training and testing stages, respectively. A multiple
regression model was also employed to predict PBC values. The other SVM kernels also provided a
high degree of accuracy in estimating PBC; however, the SVM-RBF model is recommended as a
powerful, highly reliable, and simple solution for PBC prediction.

1. Introduction


Pile foundations support the superstructure by transferring its overall load to the underlying
soil [1-3]. The pile bearing capacity (PBC), defined as the total load a pile can carry in
supporting the superstructure, is therefore of substantial significance in the design of pile
foundations, since casualties and property losses resulting from pile failure can be avoided
[4-6]. In the past few decades, field tests such as the Static Load Test (SLT) and High-Strain
Dynamic Testing (HSDT) have been the preferred means of determining parameters such as the
bearing capacity of the pile. However, it is impractical to carry out field tests on every pile
because they are time consuming and costly [3,7-9]. Since estimation of PBC is one of the hot
topics in the area of geotechnical engineering [10], many researchers have proposed different
methods for estimating it. Nevertheless, model accuracy and consistency are always of prime
importance and interest in such cases.


Many parameters influence the PBC in practice. They can be divided into three categories, i.e.,
pile geometry, soil condition, and field test setting [11-15]. These categories with their
sub-factors are presented in Figure 1. Among these factors, the pile geometry (including the
embedment length of the pile beneath the soil), the soil type, and the apparatus used in the
field test are considered the most influential parameters in measuring or predicting the PBC.
The available models for PBC estimation can be categorized into three general groups:
i) empirical/theoretical, ii) statistical, and iii) artificial intelligence (AI)/machine learning
(ML). The models in the first group are developed from the theories of previous researchers and
from laboratory test data. Empirical/theoretical techniques, e.g., the Terzaghi formula (Terzaghi
1943) and the Vesic formula [17], are not well suited to serve as reliable predictive tools,
especially when new data become available [18]. The likely reasons are that these methods involve
lengthy calculations and require many assumptions. Besides empirical/theoretical models,
statistical models have also been used for PBC prediction [19,20]. Typically, a statistical model
is a mathematical equation derived from the relationship between the predictors (inputs, usually
more than one variable) and the outcome (output) variable. Although statistical models are simple
and efficient, their performance is low, especially when extreme values are present in the data
[20]. They are also not robust enough to capture complex and nonlinear relationships [9].


Fig. 1. The most important categories for PBC prediction (pile geometry, soil conditions, and
field test setting).

Computers, programming, and computational models are now indispensable, and technology continues
to advance in response to the needs of society. In essence, AI/ML techniques combine mathematics,
algorithms, and a sense of creativity [14,21,22]. They have been applied efficiently to problems
in various areas of engineering [23-29]. In the area of PBC estimation, several AI/ML works have
been published. For instance, Pal and Deswal [30] and Momeni et al. [5] proposed solutions to the
PBC problem based on the Gaussian process regression (GPR) model and reported its successful
application in predicting PBC values. In another study, Momeni et al. [31] proposed a PBC
prediction model using a hybrid genetic algorithm (GA)-artificial neural network (ANN). They used
a total of 50 data samples and reported an excellent level of performance for the proposed model.
Harandizadeh et al. [32] applied improved neuro-fuzzy approaches to predict the PBC, and their
model achieved a very low system error. Chen et al. [33] developed several hybrid AI/ML models,
including neuro-genetic, neuro-imperialism, genetic programming (GP), and ANN, to estimate PBC
values. After evaluating these techniques, they found that the GP model scored the highest
coefficient of determination (R²) among all proposed models. It appears that AI/ML models offer a
new solution and, at the same time, the highest level of accuracy among the three groups
described for estimating PBC values.


A review of the different kinds of AI/ML models in the area of PBC prediction shows that only a
limited number of support vector machine (SVM) studies are available for predicting pile capacity
[20,34]. SVM is an ML method that has demonstrated very encouraging results in geotechnical
applications such as liquefaction assessment [35], tunneling and underground space technology
[36], dams, embankments, and retaining walls [37,38], soft soil issues [39,40], rock strength
issues [41], and the environmental issues of blasting [42]. In addition to being a powerful
modelling technique, SVM can advise the user about variables that are lacking in the training
database. Furthermore, SVM usually comes with different kernel functions, such as linear,
polynomial, sigmoid, and radial basis function (RBF), that can simplify the complexity of
nonlinear data.


This study aims to evaluate the feasibility of SVM models with different kernel functions for
predicting PBC values. To this end, various SVM kernels, i.e., dot (linear), RBF, polynomial,
neural, and ANOVA, are used to solve the problem at hand. These kernels are then evaluated on
their performance in predicting PBC values, and the best SVM kernel is selected and introduced.


2. Materials and methods 


2.1. SVM background 


A support vector machine (SVM) uses kernel functions to handle non-linear data sets by mapping
the data from the original input space to a higher-dimensional feature space. A separating
hyperplane can then be placed at the centre of the maximum margin between the support vectors
[43]. The support vectors, defined as the training points closest to the hyperplane, can number
more than two. Figure 2 depicts the geometric view of the input space divided by a hyperplane
into two parts (i.e., +1 and -1). The hyperplane takes the form of a line or a surface depending
on the dimension of the space containing the support vectors [44]. The margin between the
hyperplane and the support vectors is maximized by minimizing the norm of the weight vector w.
The margin also depends strongly on the parameter C, a hyperparameter that controls the penalty
for misclassified training examples. To identify whether an example is positive or negative, the
equation below can be used:


$y = f(x) = w \cdot \phi(x) + b$    (1)


where x and y denote the inputs and output of the model, respectively, w is the weight vector,
$\phi(x)$ is the non-linear feature mapping of the input space x, and b is the bias of the model.
Examples with $f(x) \ge 1$ are considered positive and examples with $f(x) \le -1$ are considered
negative.
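As a minimal sketch of how Eq. (1) is evaluated, the snippet below computes f(x) for a given weight vector and bias and labels the result. The identity feature map, the weights, and the return value 0 for points that fall inside the margin are illustrative assumptions and are not taken from the study.

```python
def decision_value(w, x, b, phi=lambda v: v):
    """Evaluate f(x) = w . phi(x) + b; phi defaults to the identity map (assumption)."""
    return sum(wi * fi for wi, fi in zip(w, phi(x))) + b


def classify(value):
    """Label an example from its decision value f(x)."""
    if value >= 1:
        return 1    # positive example
    if value <= -1:
        return -1   # negative example
    return 0        # inside the margin (labelling convention assumed here)


# Hypothetical weights and bias, for illustration only.
w, b = [1.0, -1.0], 0.5
for x in ([2.0, 0.0], [0.0, 2.0], [0.2, 0.0]):
    print(x, classify(decision_value(w, x, b)))
```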


Fig. 2. Schematic of the SVM.

A kernel is a mathematical function that serves as a bridge from a non-linear function to a
linear one. Figure 3 displays the structure of kernel functions in transforming the data. The
performance of SVM is greatly influenced by the choice of kernel function.


Fig. 3. Typical structure of kernel functions.

Data sets can generally be classified into two cases, separable and non-separable. In the
separable case, the hyperplane is straightforward to draw, so the linear kernel function is
commonly used. The linear kernel is the simplest kernel function, given by the inner (dot)
product of the input vectors. In engineering problems, non-separable data are common. Hence,
other kernel functions are used to produce the hyperplane with maximum margin. For instance, RBF
is the most favoured kernel when the relation between attributes is non-linear. RBF gives
trustworthy results because it interpolates well, but it becomes weak and unsuitable when
extrapolating far beyond the training range. The neural kernel, also known as the sigmoid kernel,
behaves similarly to RBF for certain parameters and is commonly used for non-separable data [45].
The neural kernel in SVM is a kind of multi-layer perceptron without a hidden layer. The RBF and
neural kernel functions are influenced by the hyperparameter gamma, whose main function is to set
the curvature of the decision boundary. The polynomial kernel function is another commonly used
function; it represents the feature space over polynomials of the original variables. Lastly, the
ANOVA kernel function is an extended version of the RBF function that combines the RBF and
Laplacian formulations. Table 1 shows the kernel formulas applied in this study. In this table,
d is the polynomial degree, γ is the gamma value for the RBF, neural, and polynomial kernels, and
x_i and x_j are the input vectors.


Table 1
Formulas for different kernel functions used in this study.
Kernel Function   Equation
Linear            $G(x_i, x_j) = x_i^T x_j$
RBF               $G(x_i, x_j) = \exp(-\gamma \lVert x_i - x_j \rVert^2)$
Polynomial        $G(x_i, x_j) = (-\gamma x_i^T x_j + 1)^d$
Neural            $G(x_i, x_j) = \tanh(-\gamma x_i^T x_j + 1)$
ANOVA             $G(x_i, x_j) = \exp(-\gamma (x_i - x_j))$

2.2. Case study and collected data 


The HSDT, commonly known as the pile driving analyzer (PDA) test, was conducted in the Pekanbaru
area, Indonesia (Figure 4). Pekanbaru is an important city in Indonesia and the capital of Riau
Province on Sumatra Island. Its population was recorded at approximately 1 million in 2014,
having grown by 3.5% per year since 1998. Rapid economic growth requires infrastructure
facilities and tower buildings to support human activities. As the number of construction
projects increases, the number of PDA tests must also increase to check the capacity of the piles
used as foundations of superstructures. Therefore, in order to propose SVM models with various
kernels for the prediction of PBC, 141 PDA tests were carried out in Pekanbaru, Indonesia. The
tests were performed on precast concrete piles. Figure 5 shows an example of a PDA test using the
pile driving analyzer equipment with the Case Pile Wave Analysis Program (CAPWAP) software to
analyze the PBC. Several parameters, including pile set, S, pile diameter, D, pile length, L,
drop height, H, and ram weight, W, were measured for these 141 tests, and the PBC values were
recorded as the target variable to be predicted. As discussed in the introduction, the measured
variables are all important for estimating PBC values. Therefore, the authors decided to use D,
L, H, S, and W as candidate model predictors (inputs) to forecast PBC values. To give a better
view of the data used, Table 2 lists 30 of the 141 data samples with their input and output
parameters. The ranges of D, L, W, H, and PBC used in the modelling were 226-600 mm, 3-48 m,
12-90 kN, 0.2-3 m, and 291-3680 kN, respectively.

Fig. 4. Location of Pekanbaru, Indonesia.

Fig. 5. PDA test conducted in Pekanbaru, Indonesia.

Table 2
A part of the input and output variables used in the modelling.
Sample No.  Pile Diameter (mm)  Pile Length (m)  Ram Weight (kN)  Drop Height (m)  Pile Bearing Capacity (kN)
1 282 8 12 1 555 
2 282 8 12 1 623 
3 282 3 12 1 536 
4 282 3 12 1 850 
5 282 3 12 1 648 
6 282 8 12 1 291 
7 282 11 13 1.5 1,572 
8 282 11 13 1.5 1,450 
9 282 13 13 1.5 854 
10 282 14 13 1.5 818 
11 282 14 13 1.5 980 
12 282 13 13 1.5 1,063 
13 395 28 35 1 1,341 
14 480 29 45 1 1,409 
15 480 29 45 1 2,200 
16 480 29 45 1 1,650 
17 226 10 13 1 1,058 
18 226 7 13 1 942 
19 226 11 13 1 774 
20 226 8 13 1 749 
21 226 8 13 1 780 
22 226 8 13 1 588 
23 226 8 13 1 707 
24 451 12 90 0.4 3,530 
25 306 17 90 0.3 2,790 
26 306 15 90 0.3 2,900 
27 451 23 90 0.4 3,430 
28 451 14 90 0.4 3,460 
29 226 17 25 0.4 780 
30 226 17 25 0.4 770 


2.3. Step-by-step overview of research 


The first step in this study was to set the research goal, which is to introduce an applicable
AI/ML technique to forecast the PBC. The SVM predictive model is expected to deliver a high level
of performance, and the target error is less than 10%. The research then continued with a review
of related published studies. This review showed a lack of studies using SVM models with
different kernels to forecast the PBC. The various kernel functions of SVM can simplify the
complex and non-linear relations between the input and output variables. After the study problem
had been identified, a set of quantitative data was obtained from the PDA tests. Once the data
had been compiled, they were filtered using the 'outlier labeling rule' proposed by Hoaglin and
Iglewicz [46] to check for missing data and outliers before analysis. The next step was input
selection, which was done using simple regression analysis. SVM models with different kernels
were then proposed to forecast the PBC. The SVM was modelled in RapidMiner, a user-friendly
modelling software for researchers. Finally, the PBC predictions of each model were assessed
using several performance indices and a simple ranking system, and the best SVM kernel was
selected and introduced as the most powerful one for prediction of the PBC. Figure 6 illustrates
the research methodology of this study.
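The outlier filtering step can be sketched as follows. The rule flags values lying more than g times the interquartile range outside the quartiles. The multiplier g = 2.2 commonly associated with this rule and the quartile method of Python's `statistics.quantiles` are assumptions, since the study does not state the settings used, and the sample values are hypothetical.

```python
import statistics


def outlier_bounds(values, g=2.2):
    """Return (lower, upper) limits: Q1 - g*(Q3 - Q1) and Q3 + g*(Q3 - Q1)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return q1 - g * spread, q3 + g * spread


def find_outliers(values, g=2.2):
    """Return the values that fall outside the labelling limits."""
    low, high = outlier_bounds(values, g)
    return [v for v in values if v < low or v > high]


# Hypothetical capacity readings in kN with one implausible entry.
readings = [555, 623, 536, 850, 648, 291, 1572, 1450, 854, 818, 50000]
print(outlier_bounds(readings))
print(find_outliers(readings))
```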


Fig. 6. Methodology procedure flowchart.


2.4. Performance index 


To identify the most precise model, different performance indices must be taken into account
during modelling and evaluation. After reviewing previous investigations, the authors decided to
apply the coefficient of determination (R²), the a20-index, the root mean square error (RMSE),
the variance accounted for (VAF, %), and the mean absolute error (MAE) to the ML/AI results. The
values of 1, 1, 0, 100%, and 0 are considered the perfect values for these indices, respectively.
The formulas of these indices are presented in Table 3. In this table, n is the total number of
data samples, O is the measured value, O' is the predicted value of O, Ō is the mean value of O,
and m20 is the number of samples whose ratio of experimental value to predicted value lies
between 0.80 and 1.20. These indices can be calculated for both the training and testing phases.

Table 3
Equations of performance indices used in this investigation.
Index        Equation
RMSE         $\sqrt{\frac{1}{n} \sum_{i=1}^{n} (O_i - O'_i)^2}$
R²           $1 - \frac{\sum_{i=1}^{n} (O_i - O'_i)^2}{\sum_{i=1}^{n} (O_i - \bar{O})^2}$
VAF (%)      $\left[1 - \frac{\mathrm{var}(O - O')}{\mathrm{var}(O)}\right] \times 100$
MAE          $\frac{1}{n} \sum_{i=1}^{n} \lvert O_i - O'_i \rvert$
a20-index    $\frac{m20}{n}$
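The five indices can be computed together in one short function. The sketch below follows the definitions given for Table 3, uses the population variance for VAF (an assumption, although the ratio is unchanged if the sample variance is used for both terms), and assumes that no predicted value is zero. The measured and predicted values are hypothetical.

```python
import math


def performance_indices(measured, predicted):
    """Return R2, RMSE, VAF (%), MAE and a20 for paired measured/predicted values."""
    n = len(measured)
    mean_o = sum(measured) / n
    residuals = [o - p for o, p in zip(measured, predicted)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - mean_o) ** 2 for o in measured)

    def variance(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values) / len(values)

    # m20: samples whose measured/predicted ratio lies within [0.80, 1.20].
    m20 = sum(1 for o, p in zip(measured, predicted) if 0.80 <= o / p <= 1.20)
    return {
        "R2": 1 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "VAF": (1 - variance(residuals) / variance(measured)) * 100,
        "MAE": sum(abs(r) for r in residuals) / n,
        "a20": m20 / n,
    }


# Hypothetical measured and predicted capacities (illustration only).
measured = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 330.0, 300.0]
print(performance_indices(measured, predicted))
```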


3. PBC modelling 


3.1. Input selection 


One shortcoming of ML/AI models is their limited practical application in different areas of
engineering. Engineers should therefore try to make such models as simple as possible for other
researchers and designers to use. One option concerns the number of inputs that must be given to
the system: the level of complexity can be decreased by reducing the number of input parameters
[47]. In addition, if fewer inputs need to be collected, data collection is easier and faster
than when all inputs are required. Based on this discussion, input (feature) selection was
conducted through simple regression analysis. Linear, exponential, power, and logarithmic
trend-line functions were fitted between each predictor and the PBC and evaluated using R². The
results of these analyses are presented in Table 4, which shows a wide range of R² values for the
different predictors. The parameters D and W clearly have a strong effect on the PBC. L and S
showed the lowest influence on the system output because they received the lowest R² values. To
decide which single parameter to remove, previous investigations were reviewed again. Based on
this review, and considering that the pile geometry category has a stronger effect on PBC than
the field test setting category, the authors decided to remove S from the predictors. Therefore,
D, L, W, and H were set as the model inputs for predicting PBC values. The following subsection
describes the SVM modelling process.

Table 4
Summary of simple regression analysis for selecting input parameters.
Parameter  Trend Line Function  Relationship                             R²
D          Linear               PBC = 6.6942D - 418.42                   0.354
D          Exponential          PBC = 395.42e^([exponent illegible]D)    0.317
D          Logarithmic          PBC = 2538.5ln(D) - 12862                0.380
D          Power                PBC = 0.17155D^[exponent illegible]      0.357
L          Linear               PBC = -5.5717L + 1959.9                  0.004
L          Exponential          PBC = 1549.2e^([exponent illegible]L)    5E-06
L          Logarithmic          PBC = 121.42ln(L) + 1472.9               0.005
L          Power                PBC = 1015.2L^[exponent illegible]       0.016
S          Linear               PBC = -24.006S + 1962.3                  0.004
S          Exponential          PBC = 1600.3e^([exponent illegible]S)    0.001
S          Logarithmic          PBC = 205.02ln(S) + 1519.2               0.017
S          Power                PBC = 1138.4S^[exponent illegible]       0.038
W          Linear               PBC = 29.751W + 199.35                   0.698
W          Exponential          PBC = 560.27e^([exponent illegible]W)    0.658
W          Logarithmic          PBC = 1105.7ln(W) - 2394.3               0.579
W          Power                PBC = 104.45W^[exponent illegible]       0.576
H          Linear               PBC = -529.81H + 2160.6                  0.049
H          Exponential          PBC = 1788.1e^([exponent illegible]H)    0.024
H          Logarithmic          PBC = -345.5ln(H) + 1610.6               0.037
H          Power                PBC = 1431.6H^(-0.117)                   0.011
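The trend-line screening summarized in Table 4 can be sketched with ordinary least squares on suitably transformed variables. The snippet fits the four trend-line types for one predictor and reports R² for each. Computing R² in the transformed space in which each fit is linear is an assumption about how the original values were obtained, and the seven (W, PBC) pairs are a subset of Table 2, so the output is not expected to reproduce Table 4.

```python
import math


def linear_fit_r2(xs, ys):
    """Least-squares line through (xs, ys); returns (slope, intercept, R2)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy ** 2 / (sxx * syy)


# Each trend line becomes a straight line after transforming x and/or y.
TRENDS = {
    "linear": (lambda x: x, lambda y: y),   # y = a*x + b
    "exponential": (lambda x: x, math.log), # ln(y) = ln(a) + b*x
    "logarithmic": (math.log, lambda y: y), # y = a*ln(x) + b
    "power": (math.log, math.log),          # ln(y) = ln(a) + b*ln(x)
}


def trend_r2(xs, ys):
    """Return R2 of each trend-line type for one predictor (positive data only)."""
    return {
        name: linear_fit_r2([fx(x) for x in xs], [fy(y) for y in ys])[2]
        for name, (fx, fy) in TRENDS.items()
    }


# Ram weight W (kN) and PBC (kN) for samples 1, 7, 13, 14, 24, 25, 29 of Table 2.
ram_weight = [12, 13, 35, 45, 90, 90, 25]
capacity = [555, 1572, 1341, 1409, 3530, 2790, 780]
print(trend_r2(ram_weight, capacity))
```

A predictor whose best R² across the four functions stays close to zero carries little information about the PBC on its own and is a candidate for removal.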


3.2. SVM modelling 


The ultimate aim of this study is to introduce a new solution for PBC prediction based on SVM and
its different kernels. As decided in the previous section, four input parameters (H, W, L, and D)
were used out of the collected variables S, H, W, L, and D. As mentioned before, RapidMiner, an
easy and fast software package, was selected to model SVM with various kernels for PBC
estimation. In AI/ML work, an important stage prior to modelling is the division of the data for
development and assessment. For model development, 80% of the 141 data samples were randomly
selected, while the remaining 20% were allocated to model assessment. This division follows
previous studies [48-50]. In the next stage, an SVM process was created in the software, as shown
in Figure 7. The process starts with inserting the database for model development. Once the
database has been inserted, the Filter Examples step is advisable but not mandatory. This step
filters out unusable entries, such as non-numerical data and symbols that are not recognized by
the system. Next, the input parameters, the outcome variable, and the predicted variable are
specified in the Set Role operator. All of these parameters are then connected to the SVM
operator, in which the software allows the kernel function to be chosen. Five kernel functions,
i.e., dot/linear, RBF, polynomial, neural, and ANOVA, were selected for predicting the PBC
values. Furthermore, the complexity constant, C, the optimizer parameters, the convergence
epsilon, and the kernel degree need to be set for each kernel function. In this study, these
parameters were determined by trial and error with the aim of obtaining the highest prediction
performance for each kernel. Table 5 presents the final values of the effective SVM parameters
for each kernel. The models and their performance in predicting PBC values are discussed later.
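The random 80/20 division described above can be sketched with the standard library alone. The seed and the use of index numbers as stand-ins for the 141 samples are assumptions made for reproducibility of the illustration; the study itself performed the split inside RapidMiner.

```python
import random


def split_dataset(samples, train_fraction=0.8, seed=0):
    """Shuffle a copy of the samples and split it into training and testing parts."""
    rng = random.Random(seed)  # fixed seed (assumed) so the split is repeatable
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Index numbers stand in for the 141 PDA test records.
train, test = split_dataset(range(141))
print(len(train), len(test))
```

With 141 samples this yields 113 training and 28 testing samples, which matches the 28 testing samples reported for the study.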


Fig. 7. Flowchart of setting up SVM in RapidMiner software for PBC prediction (operators:
Retrieve, Filter Examples, Set Role, SVM, Apply Model, Performance).


Table 5 
The final values of the effective SVM parameters for each kernel.

Kernel Type Dot RBF Polynomial Neural ANOVA 
Complexity Constant, C 1.0E-5 5.0E-6 5.0E-5 5.0E-4 5.0E-4 
Convergence Epsilon 0.10 0.20 0.01 0.01 0.01 
L Positive 1.30 1.00 1.50 1.50 1.50 
L Negative 1.30 1.00 1.50 1.50 1.50 
Kernel Degree - - 2.00 - 3.00 
Kernel Gamma - 2.00 - - 4.00 
Kernel Parameter A - - - 0.01 - 
Kernel Parameter B - - - 0.01 - 


4. Results and discussion 


As found in the previous sections, using only four variables as inputs is of more interest and
applicability in practice. Therefore, the modelling was done using four input parameters (H, W,
D, and L) to develop the best model for forecasting the PBC. The results indicate that
eliminating the parameter S causes only an insignificant deviation over the whole database.
Different SVM kernels were used as predictive models for the PBC values. Since SVM is a
statistical-based technique, the authors also applied a linear multiple regression (LMR) model to
the same training and testing portions for a fair and logical comparison. The results of the
models and their abilities in predicting PBC values are presented in Tables 6 and 7, where the
rating approach proposed by Zorlu et al. [51] was applied. In this rating system, for each
performance index the better model receives the higher rating. As shown in Tables 6 and 7, SVM
with the RBF kernel scored a total rating of 58 (out of 60), while the LMR, Dot, Polynomial,
Neural, and ANOVA models obtained 40, 29, 22, 10, and 51, respectively. As a result, RBF took the
highest position among all six models in this research for the prediction of PBC values.


Table 6
The obtained results of different SVM kernels together with LMR technique.
                    Index                                     Rating
Group  Model        R²      RMSE    VAF (%)  MAE     a20      R²  RMSE  VAF  MAE  a20
Train  LMR          0.8274  0.1198  79.14    0.0865  0.5929   4   4     3    4    5
Train  Dot          0.8241  0.1240  80.97    0.0901  0.5044   3   3     4    3    2
Train  RBF          0.9669  0.0530  96.63    0.0287  0.5309   6   6     6    6    4
Train  Polynomial   0.7555  0.1441  60.79    0.1081  0.5133   2   2     2    2    3
Train  Neural       0.6434  0.2902  1.68     0.2533  0.1681   1   1     1    1    1
Train  ANOVA        0.8456  0.1135  82.16    0.0815  0.6106   5   5     5    5    6
Test   LMR          0.8283  0.1136  82.50    0.0886  0.4643   4   4     4    4    4
Test   Dot          0.8125  0.1515  65.71    0.1194  0.2857   3   3     3    3    2
Test   RBF          0.9934  0.0235  99.27    0.0116  0.9286   6   6     6    6    6
Test   Polynomial   0.6974  0.1549  36.68    0.1244  0.3214   2   2     2    2    3
Test   Neural       0.5840  0.2615  2.57     0.2170  0.2143   1   1     1    1    1
Test   ANOVA        0.8654  0.1040  85.04    0.0634  0.6071   5   5     5    5    5
Table 7
Ratings and positions of developed models.
Model  Train rating  Test rating  Total rating  Position
LMR 20 20 40 3 
Dot 15 14 29 4 
RBF 28 30 58 1 
Polynomial 11 11 22 5 
Neural 5 5 10 6 
ANOVA 26 25 51 2 
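The simple ranking system can be expressed compactly. For each index the six models are ordered from worst to best and given ratings 1 to 6, and the ratings are then summed. The sketch below applies this to the index values of Table 6; the function and variable names are illustrative, and ties (which do not occur in these data) are not handled.

```python
# Order of the indices and whether a larger value is better for each of them.
INDICES = ("R2", "RMSE", "VAF", "MAE", "a20")
HIGHER_IS_BETTER = (True, False, True, False, True)

TRAIN = {
    "LMR": (0.8274, 0.1198, 79.14, 0.0865, 0.5929),
    "Dot": (0.8241, 0.1240, 80.97, 0.0901, 0.5044),
    "RBF": (0.9669, 0.0530, 96.63, 0.0287, 0.5309),
    "Polynomial": (0.7555, 0.1441, 60.79, 0.1081, 0.5133),
    "Neural": (0.6434, 0.2902, 1.68, 0.2533, 0.1681),
    "ANOVA": (0.8456, 0.1135, 82.16, 0.0815, 0.6106),
}
TEST = {
    "LMR": (0.8283, 0.1136, 82.50, 0.0886, 0.4643),
    "Dot": (0.8125, 0.1515, 65.71, 0.1194, 0.2857),
    "RBF": (0.9934, 0.0235, 99.27, 0.0116, 0.9286),
    "Polynomial": (0.6974, 0.1549, 36.68, 0.1244, 0.3214),
    "Neural": (0.5840, 0.2615, 2.57, 0.2170, 0.2143),
    "ANOVA": (0.8654, 0.1040, 85.04, 0.0634, 0.6071),
}


def rate_models(scores):
    """Sum per-index ratings (1 = worst, number of models = best) for each model."""
    totals = dict.fromkeys(scores, 0)
    for k, better_high in enumerate(HIGHER_IS_BETTER):
        # Sort worst-first so that the enumeration index is the rating.
        ordered = sorted(scores, key=lambda m: scores[m][k], reverse=not better_high)
        for rating, model in enumerate(ordered, start=1):
            totals[model] += rating
    return totals


train_rating = rate_models(TRAIN)
test_rating = rate_models(TEST)
total_rating = {m: train_rating[m] + test_rating[m] for m in TRAIN}
for model in sorted(total_rating, key=total_rating.get, reverse=True):
    print(model, train_rating[model], test_rating[model], total_rating[model])
```

Run on these values, the function gives the same train, test, and total ratings as Table 7, with SVM-RBF first at 58 and SVM-neural last at 10.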


SVM with the ANOVA kernel, with a rating of 51, is the second-best option for predicting PBC. LMR
obtained a total rating of 40, followed by SVM with the dot and polynomial kernels with ratings
of 29 and 22, respectively. Lastly, SVM with the neural kernel is the least accurate of the six
developed models, scoring only 10 points. Overall, two conclusions can be drawn from the results.
First, AI/ML models such as SVM with the RBF kernel and SVM with the ANOVA kernel predict PBC
more accurately than the statistical LMR model. Second, the SVM model with the RBF kernel and
four input parameters gives the best result of all. Therefore, graphs of the PBC predicted by the
simplified SVM-RBF model against the actual PBC were produced in RapidMiner for the training and
testing sets (Figures 8 and 9). In addition, the actual and predicted PBC values for the testing
set (28 data samples) were plotted for the best simplified SVM-RBF model (Figure 10). These
figures, together with the results of all models, confirm that the RBF kernel of SVM is the best
model applied in this study, with the highest accuracy and the lowest system error. The model can
be used for the same PBC problem by other designers or engineers in the future.

Fig. 8. Graph of predicted PBC versus actual PBC for the training set of SVM-RBF.

Fig. 9. Graph of predicted PBC versus actual PBC for the testing set of SVM-RBF.

Fig. 10. Actual and predicted PBC values for 28 data samples of testing using SVM-RBF model.


Compared with the literature reviewed, the SVM-RBF model developed in this study has a higher R²
value than several other predictive tools, including hybrid models such as ANN, general
regression neural networks, and the combination of the group method of data handling with fuzzy
polynomials. Over the years, several SVM models have been proposed for forecasting the PBC. For
instance, Pal and Deswal [20] investigated SVM as a potential model for predicting the static
pile capacity using a database of 81 samples and found an R² value of 0.967 for the RBF kernel
function. Samui and Kim [52] used an SVM model to forecast the PBC from 28 pile data sets and
achieved a training performance of R² = 0.951. In addition, Kordjazi et al. [53] developed an SVM
model to predict PBC using 108 data samples; their model, which incorporated a radial basis
kernel, reached a highest correlation coefficient of 0.945. The present study has two advantages
over these studies. First, it achieved a higher level of accuracy, which is always of interest
and importance in simulation studies. Second, it used a larger data set, which allows a model
with a higher level of generalization to be proposed. Researchers should be aware of this point
and try to develop models that cover a larger range of data.


5. Limitations of study 


One limitation of this study is the limited database available in the industry. In developing
AI/ML techniques, the available database plays an important role, and an incomplete or
insufficient database is the main obstacle to developing an accurate, high-performance predictive
model. The reason behind this limitation is that preparing the database is time consuming and
costly. The most popular test for obtaining the inputs is the HSDT, commonly known as the PDA
test. The test procedure is long and requires a large number of workers. In addition, heavy
machinery such as an excavator and a mobile crane is essential for the test, which also requires
a lot of expensive equipment and technology.


The second limitation is that the proposed predictive model is applicable only to the particular
area studied or to places with similar soil properties. The PBC may vary with the soil
properties, and every part of the world has different kinds of soil with different properties in
terms of cohesion, friction angle, and so on. Although the predictive model is site specific, the
algorithm and research methodology established in this study allow the prediction to be repeated
elsewhere in an easy and quick manner.


6. Conclusion 


With the aim of providing an ML/AI solution that is easy and simple, a series of experimental
works was carried out at several construction sites to determine pile capacity together with the
important factors affecting it. In this study, the relationships between these parameters and
pile capacity were modelled with statistical and SVM models. First, of the five parameters (i.e.,
H, W, D, S, and L), S was removed as the least effective parameter on pile capacity, and the rest
were used for modelling. Then, LMR and SVM models with five different kernels (i.e., dot, RBF,
neural, polynomial, and ANOVA) were proposed to predict PBC values. To identify the best of the
developed models, a rating system was used. The system ranks the models on each performance
index, and the model with the highest total rating is taken as the best. The cumulative ratings
obtained were 40, 29, 58, 22, 10, and 51 for the LMR, SVM-dot, SVM-RBF, SVM-polynomial,
SVM-neural, and SVM-ANOVA models, respectively. This shows that SVM with the RBF kernel is the
most successful model, with R² values of 0.9669 and 0.9934 for the training and testing sets,
respectively. The R² value of 0.9934 shows that SVM is one of the best AI models developed in the
area of PBC prediction. The results also show that AI models outperform the statistical model,
since SVM-RBF and SVM-ANOVA obtained higher ratings than the statistical LMR model. The proposed
SVM-RBF model is introduced as a powerful, easy-to-use, and simple model for the construction
industry to predict PBC values with a high degree of accuracy.